
    February 17th, 2017

    Since 2006, we have been experiencing two very important developments in computing. One is that tremendous resources have been invested in innovative applications such as first-principles-based models, deep learning, and cognitive computing. Many application domains are defying the conventional "it is too expensive" thinking that led to inaccuracies and missed opportunities. The other is that the industry has taken a technological path where application performance and power efficiency vary by more than two orders of magnitude depending on their parallelism, heterogeneity, and locality. Today, most of the top supercomputers in the world are heterogeneous parallel computing systems. New standards such as the Heterogeneous System Architecture (HSA) are emerging to facilitate software development. Much has been, and still needs to be, learned about algorithms, languages, compilers, and hardware architecture in these movements. What applications will continue to drive the technology development? How will we program these systems? How will innovations in memory and storage devices present further opportunities and challenges? What is the impact on the long-term software engineering cost of applications? In this talk, I will present some research opportunities and challenges that are brought about by this perfect storm.

    Enabling GPU Support for the COMPSs-Mobile Framework

    Using the GPUs embedded in mobile devices allows for increasing the performance of the applications running on them while reducing the energy consumption of their execution. This article presents a task-based solution for adaptive, collaborative heterogeneous computing in mobile cloud environments. To implement our proposal, we extend the COMPSs-Mobile framework – an implementation of the COMPSs programming model for building mobile applications that offload part of the computation to the Cloud – to support offloading computation to GPUs through OpenCL. To evaluate our solution, we subject the prototype to three benchmark applications representing different application patterns. This work is partially supported by the Joint-Laboratory on Extreme Scale Computing (JLESC), by the European Union through the Horizon 2020 research and innovation programme under contract 687584 (TANGO Project), by the Spanish Government (TIN2015-65316-P, BES-2013-067167, EEBB-2016-11272, SEV-2011-00067), and by the Generalitat de Catalunya (2014-SGR-1051). Peer Reviewed. Postprint (author's final draft).
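
    The abstract describes dispatching a task's computation to an embedded GPU through OpenCL. As a hedged illustration of what a single offloaded task looks like at the OpenCL level (the kernel, buffer sizes, and names below are assumptions made for this sketch, not code from COMPSs-Mobile), a minimal C++ host program might be:

        // Minimal OpenCL 1.x host sketch: offload a vector addition to a GPU.
        // Error handling omitted for brevity; kernel and names are illustrative.
        #include <CL/cl.h>
        #include <cstdio>
        #include <vector>

        static const char* kSrc =
            "__kernel void vadd(__global const float* a, __global const float* b,"
            "                   __global float* c) {"
            "    int i = get_global_id(0);"
            "    c[i] = a[i] + b[i];"
            "}";

        int main() {
            const size_t n = 1024;
            std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n);

            // Pick the first GPU on the first platform (on a phone, the SoC GPU).
            cl_platform_id plat; cl_device_id dev;
            clGetPlatformIDs(1, &plat, nullptr);
            clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, nullptr);
            cl_context ctx = clCreateContext(nullptr, 1, &dev, nullptr, nullptr, nullptr);
            cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, nullptr);

            // Build the kernel and stage the task's input/output buffers.
            cl_program prog = clCreateProgramWithSource(ctx, 1, &kSrc, nullptr, nullptr);
            clBuildProgram(prog, 1, &dev, nullptr, nullptr, nullptr);
            cl_kernel k = clCreateKernel(prog, "vadd", nullptr);
            cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                       n * sizeof(float), a.data(), nullptr);
            cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                       n * sizeof(float), b.data(), nullptr);
            cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY,
                                       n * sizeof(float), nullptr, nullptr);
            clSetKernelArg(k, 0, sizeof(cl_mem), &da);
            clSetKernelArg(k, 1, sizeof(cl_mem), &db);
            clSetKernelArg(k, 2, sizeof(cl_mem), &dc);

            // Launch the task and read the result back to the host.
            clEnqueueNDRangeKernel(q, k, 1, nullptr, &n, nullptr, 0, nullptr, nullptr);
            clEnqueueReadBuffer(q, dc, CL_TRUE, 0, n * sizeof(float), c.data(),
                                0, nullptr, nullptr);
            std::printf("c[0] = %.1f\n", c[0]);  // expect 3.0

            clReleaseMemObject(da); clReleaseMemObject(db); clReleaseMemObject(dc);
            clReleaseKernel(k); clReleaseProgram(prog);
            clReleaseCommandQueue(q); clReleaseContext(ctx);
        }

    In COMPSs-Mobile the programmer would not write this boilerplate: the point of the framework is that the runtime selects a device (CPU, GPU, or Cloud) and manages buffers and dependencies behind the task abstraction.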

    TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments

    Deep neural networks (DNNs) have become core computation components within low latency Function as a Service (FaaS) prediction pipelines: including image recognition, object detection, natural language processing, speech synthesis, and personalized recommendation pipelines. Cloud computing, as the de-facto backbone of modern computing infrastructure for both enterprise and consumer applications, has to be able to handle user-defined pipelines of diverse DNN inference workloads while maintaining isolation and latency guarantees, and minimizing resource waste. The current solution for guaranteeing isolation within FaaS is suboptimal -- suffering from "cold start" latency. A major cause of such inefficiency is the need to move large amounts of model data within and across servers. We propose TrIMS as a novel solution to address these issues. Our proposed solution consists of a persistent model store across the GPU, CPU, local storage, and cloud storage hierarchy, an efficient resource management layer that provides isolation, and a succinct set of application APIs and container technologies for easy and transparent integration with FaaS, Deep Learning (DL) frameworks, and user code. We demonstrate our solution by interfacing TrIMS with the Apache MXNet framework, achieving up to 24x speedup in latency for image classification models and up to 210x speedup for large models. We achieve up to 8x system throughput improvement. Comment: In Proceedings CLOUD 201
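
    The central mechanism here is a model store that persists across function invocations, so weights are loaded once and shared rather than reloaded on every "cold start". As a hedged sketch of that idea alone (the class and names below are assumptions for illustration, not TrIMS's actual API), a process-persistent model cache might look like:

        // Sketch of a persistent model store shared by FaaS invocations in one process.
        // 'Model' stands in for framework-managed weights (e.g. what an MXNet model
        // would hold); constructing it is the expensive "cold start" step.
        #include <memory>
        #include <mutex>
        #include <string>
        #include <unordered_map>

        struct Model {
            explicit Model(const std::string& path) {
                // Expensive: read weights from 'path' into GPU/CPU memory.
            }
        };

        class ModelStore {
        public:
            // First caller pays the load cost; later invocations share the instance.
            std::shared_ptr<const Model> Get(const std::string& path) {
                std::lock_guard<std::mutex> lock(mu_);
                auto it = cache_.find(path);
                if (it != cache_.end()) return it->second;      // warm hit: no reload
                auto m = std::make_shared<const Model>(path);   // cold miss: load once
                cache_.emplace(path, m);
                return m;
            }
        private:
            std::mutex mu_;
            // Kept alive by the store itself; a real system would add eviction and
            // capacity management across the GPU/CPU/local/cloud storage hierarchy.
            std::unordered_map<std::string, std::shared_ptr<const Model>> cache_;
        };

    Per the abstract, TrIMS pairs such a store with a resource management layer and container technologies, so that sharing the (read-only) model data does not weaken the isolation FaaS promises to user code.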

    A Feature Taxonomy and Survey of Synchronization Primitive Implementations

    Coordinated Science Laboratory was formerly known as Control Systems Laboratory.
    NCR Corporation
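
    Only the acknowledgments of this report survive here, but its subject, how synchronization primitives are implemented, can be illustrated with one classic entry from such a taxonomy. Purely as a hedged example (not taken from the report itself), a FIFO-fair ticket lock built from two atomic counters looks like this in C++:

        // Ticket lock: FIFO-fair mutual exclusion from two atomic counters.
        #include <atomic>
        #include <cstdio>
        #include <thread>
        #include <vector>

        class TicketLock {
        public:
            void lock() {
                // Take the next ticket, then spin until it is being served.
                unsigned my = next_.fetch_add(1, std::memory_order_relaxed);
                while (serving_.load(std::memory_order_acquire) != my)
                    std::this_thread::yield();
            }
            void unlock() {
                // Serve the next ticket in line.
                serving_.fetch_add(1, std::memory_order_release);
            }
        private:
            std::atomic<unsigned> next_{0};
            std::atomic<unsigned> serving_{0};
        };

        int main() {
            TicketLock lk;
            long counter = 0;
            std::vector<std::thread> ts;
            for (int i = 0; i < 4; ++i)
                ts.emplace_back([&] {
                    for (int j = 0; j < 100000; ++j) {
                        lk.lock(); ++counter; lk.unlock();
                    }
                });
            for (auto& t : ts) t.join();
            std::printf("counter = %ld\n", counter);  // expect 400000
        }

    Taking a ticket and waiting one's turn gives FIFO fairness, which is exactly the kind of feature that distinguishes this primitive from a plain test-and-set spinlock in a feature taxonomy.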

    Performance Implications of Synchronization Support for Parallel FORTRAN Programs

    Coordinated Science Laboratory was formerly known as Control Systems Laboratory.
    Joint Services Electronics Program / N00014-90-J-1270
    National Science Foundation / MIP-8809478
    National Aeronautics and Space Administration / NASA NAG 1-613
    NCR
    AMD 29K Advanced Processor Development Division